Skip to content

Conversation

jdries
Copy link
Contributor

@jdries jdries commented Jun 5, 2025

Open-EO/openeo-geopyspark-driver#1111

TODO:

  • changelog entry
  • based on example items dict below, construct a unit test, check if it returns valid STAC 1.1

Example item entry:

{
 "5d2db643-5cc3-4b27-8ef3-11f7d203b221_2023-12-31T21:41:00Z": {
      "geometry": {
        "coordinates": [
          [
            [
              3.359808992021044,
              51.08284561357965
            ],
            [
              3.359808992021044,
              51.88641704215104
            ],
            [
              4.690166134878123,
              51.88641704215104
            ],
            [
              4.690166134878123,
              51.08284561357965
            ],
            [
              3.359808992021044,
              51.08284561357965
            ]
          ]
        ],
        "type": "Polygon"
      },
      "assets": {
        "openEO": {
          "datetime": "2023-12-31T21:41:00Z",
          "roles": [
            "data"
          ],
          "bbox": [
            3.359808992021044,
            51.08284561357965,
            4.690166134878123,
            51.88641704215104
          ],
          "geometry": {
            "coordinates": [
              [
                [
                  3.359808992021044,
                  51.08284561357965
                ],
                [
                  3.359808992021044,
                  51.88641704215104
                ],
                [
                  4.690166134878123,
                  51.88641704215104
                ],
                [
                  4.690166134878123,
                  51.08284561357965
                ],
                [
                  3.359808992021044,
                  51.08284561357965
                ]
              ]
            ],
            "type": "Polygon"
          },
          "href": "s3://openeo-data-staging-waw4-1/batch_jobs/j-250605095828442799fdde3c29b5b047/openEO_20231231T214100Z.tif",
          "nodata": "nan",
          "type": "image/tiff; application=geotiff",
          "bands": [
            {
              "name": "LST",
              "common_name": "surface_temperature",
              "aliases": [
                "LST_in:LST"
              ]
            }
          ],
          "raster:bands": [
            {
              "name": "LST",
              "statistics": {
                "valid_percent": 66.88,
                "maximum": 281.04800415039,
                "stddev": 19.598456945276,
                "minimum": 224.46798706055,
                "mean": 259.57087672984
              }
            }
          ]
        }
      },
      "id": "5d2db643-5cc3-4b27-8ef3-11f7d203b221_2023-12-31T21:41:00Z",
      "properties": {
        "datetime": "2023-12-31T21:41:00Z"
      },
      "bbox": [
        3.359808992021044,
        51.08284561357965,
        4.690166134878123,
        51.88641704215104
      ]
    }
}

@jdries jdries marked this pull request as draft June 5, 2025 13:06
@bossie bossie linked an issue Sep 11, 2025 that may be closed by this pull request
bossie added a commit that referenced this pull request Sep 12, 2025
…TAC items"

This reverts commit 1109ec4.

This added top-level (Collection) links to item-level links implicitly so not right conceptually.
Support dedicated derived_from document only for "STAC 1.1 items" (implemented in #411).

Open-EO/openeo-geopyspark-driver#1278
Copy link
Collaborator

@bossie bossie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick review; added some points of interest inline.

I can help you with adding tests to test_views.py for these paths:

  • /jobs/<job_id>/results
  • /jobs/<job_id>/results/items11/<item_id>

and/or their signed counterparts.

ml_model_metadata = metadata

if len(result_metadata.items) > 0 :
for item_id, metadata in result_metadata.items.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for item_id, metadata in result_metadata.items.items():
for item_id in result_metadata.items.keys():

Comment on lines +1396 to +1397
if item_id == DriverMlModel.METADATA_FILE_NAME:
return _download_ml_model_metadata(job_id, item_id, user_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this ever be the case?

Copy link
Collaborator

@bossie bossie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some test skeletons to complete.

Ideally the URLs have the original /items instead of /items11.

Let me know if anything is not clear.

method_start = method_start + "11"
if not signer:
return url_for(".get_job_result_item", job_id=job_id, item_id=item_id, _external=True)
return url_for(method_start, job_id=job_id, item_id=item_id, _external=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to avoid a new endpoint (/jobs/<job_id>/results/items11/<item_id>)? I suppose the original _get_job_result_item method can also check whether the item ID corresponds to either:

  • an item derived from an asset (old case) or;
  • an actual item generated in the batch job (new case aka stac11).

Copy link
Collaborator

@bossie bossie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some wrong assertions in the tests; fixing these assertions will also mean fixing the actual code.

I suggest we go over this together tomorrow.

'id': '5d2db643-5cc3-4b27-8ef3-11f7d203b221_2023-12-31T21:41:00Z',
'properties': {'datetime': '2023-12-31T21:41:00Z'}
}
assert res.items.get(item_id) == resp_data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary. What matters is: the response to your item request matches the dummy item that you put in there in the first place.

f"/jobs/{job_id}/results/items11/{item_id}", headers=self.AUTH_HEADER
)

resp_data = resp.assert_status_code(200).json
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This JSON response should be a valid STAC item: at least type, stac_version and links are missing. You can find the rules here: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md

This is probably where the confusion stems from. The item that you get back from the call to get_result_metadata is incomplete so you'll have to augment it with some things (~ _get_job_result_item)

[3.359808992021044, 51.08284561357965]
]],
'type': 'Polygon'},
'href': 's3://openeo-data-staging-waw4-1/batch_jobs/j-250605095828442799fdde3c29b5b047/openEO_20231231T214100Z.tif',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asset URL/URI is what the client actually receives and what he will use to read the asset from. In its current form, he can't do anything with it because he needs an endpoint as well as the proper credentials to read from this internal S3 URI.

That's why we translate this internal URI to something that the client can actually use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unify asset keys across STAC items
3 participants